Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amazon-init NixOS module: fix (I think) race condition with network #22869

Merged
merged 1 commit into from Feb 16, 2017

Conversation

copumpkin
Copy link
Member

The initialization code is now a systemd service that explicitly waits for network-online, so the occasional failure I was seeing because the nixos-rebuild couldn't get anything from the binary
cache should stop. I hope!

cc @edolstra @rbvermaa

I did notice that the test occasionally fails with this message:

wrong free space 2079145984 at (eval 15) line 53, <__ANONIO__> line 577.

which I was hoping you might have an idea about. Perhaps I need a before = [ "sshd.service" ]; field?

Motivation for this change

#22830

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@mention-bot
Copy link

@copumpkin, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra and @dezgeg to be potential reviewers.

@copumpkin
Copy link
Member Author

I ran the VM test 100 times in a row and it didn't fail, but then I ran it again by hand and got the "wrong free space" error 😄

@edolstra
Copy link
Member

Where does that error message even come from? Perl? I don't recognize it.

restartIfChanged = false;
unitConfig.X-StopOnRemoval = false;

serviceConfig.Type = "oneshot";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have RemainAfterExit = true? Might be implicit, I don't remember.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that buy? I think the main difference is whether systemctl shows that it's stopped or active? I can add it easily, I just don't understand it 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to run the unit multiple times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once per boot I think is the most sensible thing. Unclear to me how RemainAfterExit will play with the nixos-rebuild switch in between, but I guess if the config for this service doesn't change due to nixos-rebuild switch, then perhaps RemainAfterExit will make a difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a follow-up to this: semantically, I think it makes sense to think of a unit of this type being in the "started" state to signifiy that the post-condition it establishes holds ("foo is initialized"); if the post-condition continues to hold after MAINPID exits, it should RemainAfterExit. On a more practical note, I suppose it's more about avoiding unnecessary work (or, worst-case, clobber state for already running services, in case the init is destructive).

@grahamc
Copy link
Member

grahamc commented Feb 16, 2017

@copumpkin
Copy link
Member Author

Okay, I added a before = [ "sshd.service" ]; which I think fixes the race with the disk resize check. Should be mergeable now. At some point I'll fix up the test to more explicitly test slow network setup by introducing artificial delay in networking (by forcing some sort of sleep as a dependency of network-pre or something like that), but this should be strictly better than before now.

The initialization code is now a systemd service that explicitly
waits for network-online, so the occasional failure I was seeing
because the `nixos-rebuild` couldn't get anything from the binary
cache should stop. I hope!
@copumpkin
Copy link
Member Author

@joachifm @edolstra okay, I added RemainAfterExit = true since it doesn't seem like it'll hurt anything and is probably right. We okay to merge?

@copumpkin
Copy link
Member Author

I'll merge in the next hour if I don't hear anything, because I'm pretty confident that this is a strict improvement in what we had before, and I want to see what Hydra has to say ASAP 😄

TBC, I've run sandboxed VM tests a bajillion times so I'm pretty sure it'll pass, but it seems good at finding things I've missed...

@copumpkin
Copy link
Member Author

💥💥💥💥

@copumpkin copumpkin merged commit 19a9099 into NixOS:master Feb 16, 2017
@copumpkin
Copy link
Member Author

Dammit, http://hydra.nixos.org/build/48803033/nixlog/2 failed. Any ideas?

@copumpkin
Copy link
Member Author

I'm confused because that test is clearly waiting for sshd.service here, and I've forced my nixos-rebuild service to run before (in the systemd sense) sshd.service, but somehow our waitForUnit("sshd.service") passes with no sign of nixos-rebuild in the log. Am I misusing/misunderstanding systemd unit ordering somehow?

copumpkin referenced this pull request Feb 24, 2017
tune2fs marks the filesystem as clean to prevent resize2fs from
complaining.

But we were invoking it before we mounted the filesystem, so the
counters would increase to 1 and it broke the functionality.

By moving the call after the mount, I have confirmed it works by:

   $ nix-build nixos/tests/ec2.nix

cc @rbvermaa @edolstra
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants