nixos/acme: fix subjectAltName in test snakeoil certs #96149

Merged
merged 2 commits into from Aug 31, 2020


JJJollyjim
Member

Motivation for this change

nixosTests.acme has been broken since the bump to Go 1.15: Go's HTTPS client now requires a subjectAltName in servers' certificates, and the snakeoil cert we were generating only had a CN, thanks to openssl's footgun of a command-line interface. The generator script has been fixed and the certs regenerated.
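To illustrate why a CN-only certificate stops working, here is a minimal sketch of the two matching policies. It is not Go's actual verifier: it ignores wildcards and IP addresses, and the function names are made up for this example. The certificate dicts follow the shape returned by Python's `ssl.SSLSocket.getpeercert()`.

```python
from typing import List, Optional


def san_dns_names(cert: dict) -> List[str]:
    """Return the DNS entries of the subjectAltName extension, if any."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def common_name(cert: dict) -> Optional[str]:
    """Return the subject's commonName, or None when it is absent."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def hostname_ok(cert: dict, hostname: str, allow_cn_fallback: bool = False) -> bool:
    """Exact, case-insensitive hostname match (hypothetical helper).

    With allow_cn_fallback=False only subjectAltName is consulted, which is
    the stricter policy. With allow_cn_fallback=True the CN is used when the
    certificate carries no DNS SAN at all, which is the legacy behaviour.
    """
    names = san_dns_names(cert)
    if not names and allow_cn_fallback:
        cn = common_name(cert)
        names = [cn] if cn else []
    return hostname.lower() in (name.lower() for name in names)


# A CN-only certificate, like the old snakeoil cert, versus one with a SAN.
cn_only = {"subject": ((("commonName", "acme.test"),),)}
with_san = {
    "subject": ((("commonName", "acme.test"),),),
    "subjectAltName": (("DNS", "acme.test"),),
}
```

Under the strict policy `hostname_ok(cn_only, "acme.test")` is false, while the same call on `with_san` is true, which is why regenerating the certs with a subjectAltName is enough.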

Additionally, lego's passthru.tests has been set, since it was lacking before :)

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@JJJollyjim
Member Author

@flokli
Contributor

flokli commented Aug 27, 2020

Hmm, this still seems to fail for me:

webserver # [   20.170562] nixos[882]: finished switching to system configuration /nix/store/10c0lwmrh31p9ff14k4ss50y2iw8nzyd-nixos-system-webserver-20.09pre-git
(7.45 seconds)
client: must succeed: curl --cacert /tmp/ca.crt https://b.example.test/ | grep -qF 'hello world'
client #   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
client #                                  Dload  Upload   Total   Spent    Left  Speed
client #   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
client # curl: (60) SSL certificate problem: self signed certificate in certificate chain
client # More details here: https://curl.haxx.se/docs/sslcerts.html
client #
client # curl failed to verify the legitimacy of the server and therefore could not
client # establish a secure connection to it. To learn more about this situation and
client # how to fix it, please visit the web page mentioned above.
client: output:
Test "Can add another certificate for nginx service" failed with error: "command `curl --cacert /tmp/ca.crt https://b.example.test/ | grep -qF 'hello world'` failed (exit code 1)"
error:
Traceback (most recent call last):
  File "/nix/store/yvccmvprvrds4dvahgy7ziava0bs2h38-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 897, in run_tests
    exec(tests, globals())
  File "<string>", line 1, in <module>
  File "<string>", line 40, in <module>
  File "/nix/store/yvccmvprvrds4dvahgy7ziava0bs2h38-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 421, in succeed
    raise Exception(
Exception: command `curl --cacert /tmp/ca.crt https://b.example.test/ | grep -qF 'hello world'` failed (exit code 1)
cleaning up
killing acme (pid 33)
killing acmeStandalone (pid 66)
killing client (pid 9)
killing dnsserver (pid 21)
killing webserver (pid 86)
(0.00 seconds)
builder for '/nix/store/kswx12850q147nkrl02d6qyhzcraafn6-vm-test-run-acme.drv' failed with exit code 1
error: build of '/nix/store/kswx12850q147nkrl02d6qyhzcraafn6-vm-test-run-acme.drv' failed

@JJJollyjim
Member Author

@flokli that's very odd... that's a new error I haven't seen before.

It passes for me on Arch and on NixOS, and on ofborg.

@JJJollyjim
Member Author

JJJollyjim commented Aug 27, 2020

@flokli your test run has a different store path to mine (/nix/store/vlpxr871i3sr7d152z5vbb3x75vpswdh-vm-test-run-acme.drv). Is there something different about how you're running it?

@JJJollyjim
Member Author

(actually weirdly mine doesn't match ofborg... not sure how that happens, we're on the same commit)

@JJJollyjim
Member Author

I wonder if there could be a race condition or transient failure where these lines:

      client.succeed("curl https://acme.test:15000/roots/0 > /tmp/ca.crt")
      client.succeed("curl https://acme.test:15000/intermediate-keys/0 >> /tmp/ca.crt")

are producing the wrong output? curl isn't invoked with --fail, so potentially it could be writing a 404 page or some other error response into /tmp/ca.crt.
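Besides passing `--fail` to curl, the test could sanity-check what was downloaded before trusting it as a CA bundle. The following is a minimal sketch, not part of the actual test: `pem_block_labels` and `require_pem` are hypothetical helpers that only check that the text contains at least one PEM block.

```python
import re
from typing import List

# Matches one PEM block, e.g. "-----BEGIN CERTIFICATE----- ... -----END CERTIFICATE-----".
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\r?\n[A-Za-z0-9+/=\r\n]+-----END \1-----"
)


def pem_block_labels(text: str) -> List[str]:
    """Return the label of every PEM block found in text, in order."""
    return [match.group(1) for match in PEM_BLOCK.finditer(text)]


def require_pem(text: str, source: str) -> str:
    """Return text unchanged, or raise if it holds no PEM block at all.

    An HTML or plain-text error page (for example a 404 body) fails here
    instead of silently ending up in the CA file.
    """
    if not pem_block_labels(text):
        raise ValueError(f"{source}: response does not contain a PEM block")
    return text
```

In the test script this could wrap the output of the two curl commands, so a bad download fails at the fetch step rather than later as a confusing certificate-chain error.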

@JJJollyjim
Member Author

Is anyone else able to reproduce this? Or @flokli are you able to investigate?

@m1cr0man
Contributor

Is anyone else able to reproduce this? Or @flokli are you able to investigate?

I had the same thing happen when working on #91121, and I can't narrow down what I did to fix it.

You could try adding acme.wait_for_unit("network.target") before the curl commands?

@m1cr0man
Contributor

I've started encountering the same issue as above in #91121, despite my wait_for_unit trick and also using webserver.succeed("sync"). I believe the reason is that calling systemctl reload nginx does not imply that the config has been reloaded when it finishes, since all it does is send SIGHUP.

As explained here, a SIGHUP will cause the master process to delegate restarts of the worker processes. My bet is that this takes longer than the time it takes the test script to go from the reload to running curl. apachectl also sends signals, so it will not reliably block until the service has actually reloaded either.

I'm not sure what the best solution is here. Adding a simple sleep would certainly solve the problem, but other than that I can only think of things like watching file descriptors on sockets or waiting for PIDs to change, which I don't know how to do cleanly.
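One middle ground between a fixed sleep and watching PIDs is to poll for the observable effect of the reload, with a timeout. This is a generic sketch, not the approach taken in #91121; `wait_until` is a hypothetical helper, and the predicate would be something like "curl against the new vhost succeeds".

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> float:
    """Call predicate repeatedly until it returns True.

    Returns the number of seconds waited. Raises TimeoutError once the
    timeout has elapsed without the predicate becoming true.
    """
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

The NixOS test driver's `wait_until_succeeds` applies the same retry idea to a shell command, so replacing `succeed` with it for the first curl after a reload may be enough.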

@m1cr0man
Contributor

I came up with a solution for the above problem in the other PR if anyone here is interested :)

@JJJollyjim
Member Author

Can we merge this for now and fix the nondeterminism later?

@arianvp
Member

arianvp commented Aug 31, 2020 via email

@m1cr0man
Contributor

Yeah grand. All the more pressure on people to merge #91121 too 😉

@arianvp arianvp merged commit 882ed67 into NixOS:master Aug 31, 2020

4 participants