New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
amazon-init NixOS module: fix (I think) race condition with network #22869
Conversation
@copumpkin, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra and @dezgeg to be potential reviewers. |
I ran the VM test 100 times in a row and it didn't fail, but then I ran it again by hand and got the "wrong free space" error 😄 |
Where does that error message even come from? Perl? I don't recognize it. |
restartIfChanged = false; | ||
unitConfig.X-StopOnRemoval = false; | ||
|
||
serviceConfig.Type = "oneshot"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have RemainAfterExit = true
? Might be implicit, I don't remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does that buy? I think the main difference is whether systemctl
shows that it's stopped or active? I can add it easily, I just don't understand it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to run the unit multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once per boot I think is the most sensible thing. Unclear to me how RemainAfterExit
will play with the nixos-rebuild switch
in between, but I guess if the config for this service doesn't change due to nixos-rebuild switch
, then perhaps RemainAfterExit
will make a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a follow-up to this: semantically, I think it makes sense to think of a unit of this type being in the "started" state to signifiy that the post-condition it establishes holds ("foo is initialized"); if the post-condition continues to hold after MAINPID
exits, it should RemainAfterExit
. On a more practical note, I suppose it's more about avoiding unnecessary work (or, worst-case, clobber state for already running services, in case the init is destructive).
74ca5a5
to
bd39185
Compare
Okay, I added a |
The initialization code is now a systemd service that explicitly waits for network-online, so the occasional failure I was seeing because the `nixos-rebuild` couldn't get anything from the binary cache should stop. I hope!
bd39185
to
b172684
Compare
I'll merge in the next hour if I don't hear anything, because I'm pretty confident that this is a strict improvement in what we had before, and I want to see what Hydra has to say ASAP 😄 TBC, I've run sandboxed VM tests a bajillion times so I'm pretty sure it'll pass, but it seems good at finding things I've missed... |
💥💥💥💥 |
Dammit, http://hydra.nixos.org/build/48803033/nixlog/2 failed. Any ideas? |
I'm confused because that test is clearly waiting for |
tune2fs marks the filesystem as clean to prevent resize2fs from complaining. But we were invoking it before we mounted the filesystem, so the counters would increase to 1 and it broke the functionality. By moving the call after the mount, I have confirmed it works by: $ nix-build nixos/tests/ec2.nix cc @rbvermaa @edolstra
The initialization code is now a systemd service that explicitly waits for network-online, so the occasional failure I was seeing because the
nixos-rebuild
couldn't get anything from the binarycache should stop. I hope!
cc @edolstra @rbvermaa
I did notice that the test occasionally fails with this message:
which I was hoping you might have an idea about. Perhaps I need a
before = [ "sshd.service" ];
field?Motivation for this change
#22830
Things done
(nix.useSandbox on NixOS,
or option
build-use-sandbox
innix.conf
on non-NixOS)
nix-shell -p nox --run "nox-review wip"
./result/bin/
)