Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/statsd: refactor test #40554

Merged
merged 1 commit into from May 16, 2018
Merged

nixos/statsd: refactor test #40554

merged 1 commit into from May 16, 2018

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented May 15, 2018

Motivation for this change

statsd is a daemon written in node to gather statistics over UDP.
The current test ensures that a port is open, but the basic
functionality isn't sufficiently tested.

This patch contains the following changes:

  • Simplified port scanning (waitForOpenPort rather than netcat magic).

  • Issue a TCP command to check the health of the statsd server.

  • Simple script to check if statsd receives data over UDP and confirms
    the basic functionality of the TCP interface on 8126 for admin
    purposes.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

Copy link
Contributor

@xeji xeji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This makes the test much more useful.

$machine->waitForOpenPort(8126);

# check state of the `statsd` server
$machine->succeed('[ "health: up" = "$(echo health | nc 127.0.0.1 8126 -w 1)" ];');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short -w 1 timeout may cause non-deterministic failures on Hydra with high system load.
I would set a much longer timeout such as 120s. Same for the other TCP requests below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this would be simpler, easier to read and more consistent with the rest:

$machine->succeed("echo health | nc -w 120 127.0.0.1 8126 -w 120 | grep -q 'health: up' ");

@Ma27
Copy link
Member Author

Ma27 commented May 15, 2018

@xeji increased the timeouts to 120 minutes (-w 120), thanks for your feedback %)

Copy link
Contributor

@xeji xeji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The netcat options are not yet correct. Sorry I missed some things before (we recently switched master from openbsd-netcat to a version bundled with libressl, which has slightly different options and behaves differently, and I checked locally with the old version).

If everything is correct, the test should finish quickly without extra waits. At least it does on my local machine...

$machine->waitForOpenPort(8126);

# check state of the `statsd` server
$machine->succeed('[ "health: up" = "$(echo health | nc 127.0.0.1 8126 -w 120)" ];');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add -N for all TCP connections (but not for UDP) so the connection is closed after EOF on input. Otherwise this will wait 120s even after a successful connection, and the test takes forever.

# confirm basic examples for metrics derived from docs:
# https://github.com/etsy/statsd/blob/v0.8.0/README.md#usage and
# https://github.com/etsy/statsd/blob/v0.8.0/docs/admin_interface.md
$machine->succeed("echo 'foo:1|c' | nc -u -w 120 127.0.0.1 8125");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this one is UDP, you actually need -w 0 here. (but not -N)

# https://github.com/etsy/statsd/blob/v0.8.0/README.md#usage and
# https://github.com/etsy/statsd/blob/v0.8.0/docs/admin_interface.md
$machine->succeed("echo 'foo:1|c' | nc -u -w 120 127.0.0.1 8125");
$machine->succeed("echo counters | nc -w 120 127.0.0.1 8126 | grep foo");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and everything below) is TCP again. Add -N.

`statsd` is a daemon written in `node` to gather statistics over UDP.
The current test ensures that a port is open, but the basic
functionality isn't sufficiently tested.

This patch contains the following changes:

* Simplified port scanning (`waitForOpenPort` rather than `netcat` magic).

* Issue a TCP command to check the health of the `statsd` server.

* Simple script to check if `statsd` receives data over UDP and confirms
  the basic functionality of the TCP interface on `8126` for admin
  purposes.
@Ma27
Copy link
Member Author

Ma27 commented May 16, 2018

@xeji yeah it doesn't wait as well on my side. Thanks for your help!

@xeji
Copy link
Contributor

xeji commented May 16, 2018

@GrahamcOfBorg test statsd

@GrahamcOfBorg
Copy link

No attempt on aarch64-linux (full log)

The following builds were skipped because they don't evaluate on aarch64-linux: tests.statsd

Partial log (click to expand)


a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnfree = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnfree = true; }
to ~/.config/nixpkgs/config.nix.


@GrahamcOfBorg
Copy link

Success on x86_64-linux (full log)

Attempted: tests.statsd

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 14.21s
cleaning up
killing machine (pid 593)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/z90dlkgm8mrp13qixhi36ym14lkkz9sy-vm-test-run-statsd

@xeji
Copy link
Contributor

xeji commented May 16, 2018

14 seconds - looks good 😃 - thanks for improving this test!

@xeji xeji merged commit 70d64d1 into NixOS:master May 16, 2018
@Ma27 Ma27 deleted the statsd/refactor-test branch May 16, 2018 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants