nixosTests: re-enable networking tests #86486

Merged
merged 1 commit into from Aug 28, 2020

flokli
Contributor

@flokli flokli commented May 1, 2020

5150378 fixed the long-broken
nixosTests.networking.virtual.

With all test failures fixed, and #79328 making debugging much easier,
let's re-add it to the tested jobset.

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@andir
Member

andir commented May 2, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@andir
Member

andir commented May 2, 2020

Some of the tests seem to be flaky. See the aarch64 build results.

@flokli
Contributor Author

flokli commented May 4, 2020

Hm, I tried squinting at the logs to spot the error; the only parts I found were failures specific to the nixos-test driver:

  File "/nix/store/qv7jilbizwv4cz2rbh46hhkyzavwp4bv-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/rsjp3g6hny6b7m3m6kv753lps0pvdqak-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=$out/log.xml tests='exec(os.environ["testScript"])' /nix/store/5gzzr77gg756pgxdk76wyhjshyfkvcl6-nixos-test-driver-vlan-Networking-Networkd/bin/nixos-test-driver
  File "/nix/store/qv7jilbizwv4cz2rbh46hhkyzavwp4bv-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/rsjp3g6hny6b7m3m6kv753lps0pvdqak-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=$out/log.xml tests='exec(os.environ["testScript"])' /nix/store/5gzzr77gg756pgxdk76wyhjshyfkvcl6-nixos-test-driver-vlan-Networking-Networkd/bin/nixos-test-driver

I couldn't spot any networking-specific flakiness in there. It looks more like generic test-driver flakiness, maybe only happening under high load?

@tfc, any ideas?

@flokli
Contributor Author

flokli commented May 9, 2020

I rebased on top of 78f2a83, assuming this should fix the observed flakiness.

@flokli
Contributor Author

flokli commented May 9, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@flokli
Contributor Author

flokli commented May 9, 2020

Hrm, this still fails:

Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Thread 0x0000fffff52101e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Thread 0x0000fffff5a111e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Thread 0x0000fffff62121e0 (most recent call first):
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 93 in eprint
  File "/nix/store/kb785q5ry1s7s9fcvxyba1c6c52w0zha-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 748 in process_serial_output
Current thread 0x0000fffff7ff5780 (most recent call first):
/nix/store/yaw7vxl73i6ii08yqid69mli216b9p28-stdenv-linux/setup: line 1271:     6 Aborted                 (core dumped) LOGFILE=/dev/null tests='exec(os.environ["testScript"])' /nix/store/xmi7nwsf3fidj6pqhkpfnd8bvjrlbskn-nixos-test-driver-Privacy-Networking-Networkd/bin/nixos-test-driver
builder for '/nix/store/vyz7ylzi3iqwmphlzv9nfchax35dlr8f-vm-test-run-Privacy-Networking-Scripted.drv' failed with exit code 134

flokli referenced this pull request May 9, 2020
If a program (e.g. nixos-install) writes more than 1000 lines to
stderr during execute(), then process_serial_output() deadlocks
waiting for the queue to be processed. So use an unbounded queue
instead.

We should probably get rid of the structured log output (log.xml),
since then we don't need the log queue anymore.
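
The deadlock described in the commit message above can be illustrated with a minimal sketch. It is not the test driver's actual code: the helper `try_enqueue` and the variable names are hypothetical, and only the limit of 1000 is taken from the commit message. It uses non-blocking puts so the example terminates; with a blocking `put()` and no consumer draining the queue, the producer would simply hang once the bounded queue is full.

```python
import queue


def try_enqueue(q, n_lines):
    """Put up to n_lines items without blocking; return how many were accepted."""
    accepted = 0
    for i in range(n_lines):
        try:
            q.put_nowait(f"line {i}")
        except queue.Full:
            # A blocking put() would wait here until a consumer frees a slot.
            break
        accepted += 1
    return accepted


# Bounded queue: the limit of 1000 mirrors the number in the commit message.
bounded = queue.Queue(maxsize=1000)
# Unbounded queue: maxsize defaults to 0, which means "no limit".
unbounded = queue.Queue()

bounded_accepted = try_enqueue(bounded, 1500)
unbounded_accepted = try_enqueue(unbounded, 1500)
print(bounded_accepted, unbounded_accepted)
```

The trade-off of the unbounded variant is memory: nothing stops the queue from growing if the consumer never catches up.
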
@flokli flokli requested a review from tfc as a code owner May 9, 2020 11:25
@flokli
Contributor Author

flokli commented May 9, 2020

@ofborg test networking.scripted.link networking.scripted.privacy networking.scripted.routes networking.scripted.virtual networking.networkd.bond networking.networkd.bridge networking.networkd.dhcpOneIf networking.networkd.dhcpSimple networking.networkd.link networking.networkd.loopback networking.networkd.macvlan networking.networkd.privacy networking.networkd.routes networking.networkd.sit networking.networkd.static networking.networkd.virtual networking.networkd.vlan

@flokli
Contributor Author

flokli commented May 9, 2020

Even when removing all the queue stuff, I still got the

Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads

So this seems to come from process_serial_output being in a separate thread, and the underlying print to stderr.

With all the xml/html log output gone since #87191, I'll try reworking this to use Python's native logging framework, which is supposed to be thread-safe.
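
As a rough sketch of that direction (not the driver's real implementation), a reader thread can emit lines through the `logging` module instead of printing to stderr directly. The function name `process_serial_output` is borrowed from the traceback; `make_logger`, the logger name, and the sample lines are hypothetical. Using a non-daemon thread and joining it before exit is an assumption about how to keep the thread from writing during interpreter shutdown, not something confirmed in this thread.

```python
import io
import logging
import threading


def make_logger(stream):
    """Build a logger that writes bare messages to the given stream."""
    logger = logging.getLogger("serial-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False
    return logger


def process_serial_output(lines, logger):
    """Stand-in for the driver's reader loop: log each serial line."""
    for line in lines:
        logger.info(line)


# An in-memory stream keeps the example self-contained; a real setup
# would more likely log to sys.stderr.
stream = io.StringIO()
logger = make_logger(stream)

worker = threading.Thread(
    target=process_serial_output,
    args=(["boot ok", "login:"], logger),
)
worker.start()
worker.join()  # wait for the reader before the interpreter shuts down

logged = stream.getvalue().splitlines()
print(logged)
```

Handlers in `logging` serialize access to their stream with a lock, so several reader threads can share one logger.
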

@flokli
Contributor Author

flokli commented May 22, 2020

Converting to draft until the discussion following #87191 (comment) has been done.

Member

@Ma27 Ma27 left a comment

After having read the threads in this PR, #86889 and #87191, I think it's preferable to have a stable test-driver rather than waiting until we've reached consensus about what $out and logging should look like.

IMHO we can still re-add lost features later on. I may be rather unlucky, but I regularly experience frozen VM tests (that get "fixed" by restarting them), and right now I have a simple VM using grafana and loki (based on the test-driver) that reproducibly breaks when trying to shut it down (with the error demonstrated in #86889). I think it's more important to get these kinds of (known) issues under control.

@flokli
Contributor Author

flokli commented Jul 11, 2020

Yeah, I agree. Feel free to take over this PR - I can't currently pursue this.

@flokli flokli closed this Jul 11, 2020
@Ma27
Member

Ma27 commented Jul 12, 2020

I'm sorry, but I don't want to start another time-consuming task here; I should first finish a lot of other NixOS-related stuff I'm hacking on 😅

If anyone who subscribed to this thread wants to take over, I'd be fairly grateful!

@flokli flokli mentioned this pull request Aug 24, 2020
10 tasks
@Mic92
Member

Mic92 commented Aug 25, 2020

#96254

5150378 fixed the long-broken
nixosTests.networking.virtual.

With all test failures fixed, and NixOS#79328 making debugging much easier,
let's re-add it to the tested jobset.
@flokli
Contributor Author

flokli commented Aug 27, 2020

With #96254 being merged, I reopened this, dropped the wip refactor now done in #96254, and rebased to latest master.

@flokli flokli marked this pull request as ready for review August 27, 2020 10:18
@flokli flokli requested a review from Ma27 August 27, 2020 10:37
@andersk
Contributor

andersk commented Aug 31, 2020

networking.networkd.macvlan is failing most of the time: #96709

@andersk
Contributor

andersk commented Aug 31, 2020

Also #96254 was just reverted (in #96703, to fix #96699).
