[RFC] nixpkgs/tests: create nixos-independent PoC tests #36842

7c6f434c · 2018-03-12T02:06:06Z

for libreoffice, maxima, texlive, imagemagick, zathura, evince, youtube-dl

Motivation for this change

Some things in Nixpkgs are testable via NixOS test, which is good.

But NixOS tests are an integration test framework for an OS. So they need a system path with a lot of things, a long list of system services, require /dev/kvm access, boot a full NixOS instance which is not instant — especially in KVM, have significant overhead even afterwards and a lot of sources of non-determinism and hard-to-track state transitions.

Also, they currently only support keyboard input.

What I present is a sample of how simple minimally useful tests «nothing obvious is broken» can be.

Highlights:

Cheap; if I need X I just run xdummy, if I don't — even better.
Self-contained: if you see something working in a sandbox test without VMs, you can always reproduce it inside a proper combination of namespace jails, and probably you can just run it in a nix-shell.
Mouse-compatible: the recently added fmbt has an acceptable interface for OCR-driven mouse-based GUI testing.
Transparent: I run few enough processes that I don't need to put random sleeps to let things settle. I know what I am waiting for and I can poll its state directly.
Can test application-level integration: see a test that checks that Zathura can open pdfLaTeX output.
Meta.
There are some brief comments about the intent and planned evolution of the code.

Not done — more examples:

Any abstraction. Current version of abstraction is still somewhat weird, should be replaced after we have more tests.
Any support for a test suite as a coherent whole. Well, you can build the entire default.nix.
Connecting packages (and especially libraries) and tests. Realistically, maxima.nix is not a test for Maxima — it is a test for SBCL. How this should be reflected? This is a large an complicated topic, and NixOS tests could be the first goal anyway.
Non-default pre-processing for OCR: right now I use a huge DPI or zoom on a small screen geometry precisely to make sure that a character is not just an edge, as edge detection mangles single-pixel-wide characters too much to recognise.

Things done

Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
Built on platform(s)
- almost NixOS
~~Tested via one or more NixOS test(s) if existing and applicable for the change~~ Emphatically not applicable

grahamc · 2018-03-12T13:47:59Z

cc @Profpatsch

markuskowa · 2018-03-12T22:14:08Z

Would be possible to add a convenience function that tests simply if all binaries (like the checkpoint in the PR template) in $out/bin can be executed? For example let's say this function would check if the executable terminates without a segfault or dynamic linker errors.

7c6f434c · 2018-03-12T22:34:39Z

Sure. Added a bit of stuff: a combined test of pdflatex + zathura, and a test that everything in coreutils except false and test reacts to --version with a non-empty string.

7c6f434c · 2018-03-12T22:40:40Z

Updated the description because of an additional couple of tests.

Profpatsch · 2018-03-13T00:15:54Z

@7c6f434c have you seen my talk from Nixcon?
I should be able to finally push the code, not that a lot has changed by I didn’t have the time up until now.

markuskowa · 2018-03-13T05:51:03Z

The check-all-executables functions might even have a check mode that is even more general and check for special return codes that indicate a fatal error (126-165, http://www.tldp.org/LDP/abs/html/exitcodes.html). The motivation behind that is the fact, that once in while I have come across packages that build but crash with a segfault or a linker error.

7c6f434c · 2018-03-13T07:19:55Z

@markuskowa sure, that's easy.

A marginally more interesting question is how to make sure the program doesn't have a behaviour where it waits for an event to react without needing either input or output, a test like the current ones will just hang.

On the other hand, there are also commands that will exit with 127 when passed --help, so my current PR is a PoC for easily creating cheap tests according to human judgement, not for writing a test that Must Always Pass For Every Package.

7c6f434c · 2018-03-13T07:34:56Z

@Profpatsch I don't normally watch video, but yes, I have read your slides before.

In a way, this PR is orthogonal in a way: you want a way to run tests, I want a way to create them. You were willing to agree with VM tests, I want to avoid their overhead in a hope that faster tests are easier to debug and run, so there will be more of them. Imagine an auto-upgrade script that will be able claim «In 2017, half of the upgrades got strictly less testing — manual testing — than the amount we run automatically».

I am not sure I completely believe in full feasibility of your maxim re: locality and unavoidability of tests, because I want to have tests like TeXLive + Zathura that cannot be ascribed to any single package, and an interoperability test is hard to assign to any one side (and in some other cases can also increase build closure beyond reason, even if Zathura happens to depend on TeXLive in this specific case).

7c6f434c · 2018-03-13T07:37:48Z

Added a note about the current need for large fonts to make OCR work.

markuskowa · 2018-03-13T07:56:56Z

@7c6f434c Sure there needs to be some room for exceptions. Using --version as a standard argument is a good default but not all executables will understand it. I agree that this simple execution test needs to be decided on case-by-case basis. I will try to come up with a refinement.

7c6f434c · 2018-03-13T18:02:13Z

@markuskowa if you look inside the implementation, you'll see there is — I had to ban false and test from version check, after all. I have now added some support for non-special-ness of the exit codes.

Profpatsch

Overall, I’d recommend a very conservative approach to this completely new API; rather than expose everything from the beginning, first keep it local and only expose once there’s two or more usages of the same pattern.

Also, documentation, docstrings for everything and manual entries before it can be merged.

Profpatsch · 2018-03-13T18:23:02Z

tests/lib/default.nix

@@ -0,0 +1,62 @@
+{ pkgs }:
+rec {


Every part of this attribute will be the public testing API as soon as it gets merged and will have to be supported for probably all eternity.
Big -1 from me for the current exposed symbols, there needs to be a lot more consensus first, and new (non-internal) symbols should only be introduced with care and properly documented.

As I said, abstraction is currently done wrong, because there are not enough tests to generalise from. The comment-gathering re: what kinds of tests can be done seems to happen somehow.

Consensus won't happen anyway, but right now even consistency will be a step ahead.

What I did miss is that I haven't added [RFC] and [WIP] tags in the name to clarify the intent; fixed that.

Profpatsch · 2018-03-13T18:26:27Z

tests/data/latex-file.tex

+Test LaTeX content
+
+\end{document}
+


Sorting the tests into folders might be more sensible, since test data can normally not be shared;

tests/ zathura-latex/ latex-file.tex default.nix

Once there are hundreds of tests, this will help with naming files.

Yes, sounds good.

Actually, maybe having this test as a combined test was a mistake anyway.

Separated tests for generation and viewing of PDFs, created some folders

markuskowa · 2018-03-13T19:21:06Z

@7c6f434c Thanks, it looks pretty neat now. I will give checkAllExecutables a try with some of my derivations.

7c6f434c · 2018-03-13T19:53:27Z

Re: two instances of the same pattern: unfortunately, most of the X stuff gets repeated even with two tests…

I do agree that the current API doesn't exist, neither is there any documentation of this non-existent object. This has to be fixed before merging. Still, I don't think Nixpgs manual entries should be added at this step, because of their high visibility — some period of use by early adopters should happen before advertising something in the manual.

7c6f434c · 2018-03-14T01:09:15Z

Updated description to reflect a slightly wider variety. Suggestions welcome, implementation of suggestions not guaranteed.

…xima

Change layout a bit Separate tests for PDF generation and tests for PDF viewing; the latter use the outputs of the former.

edolstra · 2018-03-21T13:00:26Z

@wmertens Except that the tests proposed here are system/integration tests, exactly like the existing VM infrastructure, including running X servers, doing OCR on output, running system daemons like dbus, etc. We're not talking about running make check which is already covered by checkPhase.

7c6f434c · 2018-03-21T13:08:42Z

Application integration tests provide only what application needs to run, not what NixOS needs to boot. The difference is a few times, with corresponding difference in speed and debugging annoyance.

shlevy · 2018-03-29T13:42:04Z

@edolstra Thoughts on where/how to address this? Or do you think the existing NixOS tests are adequate?

joepie91 · 2018-06-09T15:35:08Z

To pitch in: a short testing feedback loop is absolutely critical if you want to encourage a test-driven packaging process. Test durations should ideally be measured in seconds, and be short enough that they don't (mentally) interrupt whatever fixing work you're trying to do. It should be a matter of 'open terminal, run command, look at output, go back to fixing'.

The current bootstrapping time for any system test is simply too high to usefully test application functionality.

If we don't want deduplication, then perhaps we should look into possible ways to deduplicate the testing functionality between system tests and these proposed application integration tests; but that application integration tests are necessary as a third category is not really in question.

(EDIT: To be clear, I'm not saying that the system tests are not useful. But "test that everything works in the context of a NixOS system" and "test that particular functionality works when combined with particular other applications" are two distinctly different usecases, with different optimal tradeoffs.)

nyarly · 2018-06-27T16:26:34Z

To throw in a completely different angle, I found https://github.com/NixOS/nixpkgs/blob/c4fc70f53cfaf673bde7fd009826d62bececa161/pkgs/development/ruby-modules/bundled-common/test.nix incredibly useful. Maybe that's because what I was writing was (too?) complex, but testing the actual Nix functions themselves was extremely quick, a very fast feedback cycle.

There are some (underappreciated, imo) testing utilities in lib/, but having something more robust and more widely accepted would be nifty.

timokau · 2018-07-14T11:38:07Z

@edolstra Is there something we (I) can do to make this experiment acceptable? It has many merits and apparently at least some community support. It also doesn't add very much complexity and can be seen as a community experiment for now. We could then schedule a re-visit of the matter after some time.

Or is this BDFL-blocked in any form?

grahamc · 2018-07-14T12:34:33Z

@timokau In these replies he hasn't said package tests are bad or undesirable, and provided feedback on ways to approach it which would be acceptable.

from #36842 (comment):

Duplicating the VM testing infrastructure in a crappy language (bash) that's significantly worse to maintain or write correct code in than Perl doesn't seem like a good idea to me.

Don't duplicate the functionality, especially in a pretty crappy language

Package tests should be kept close to the Nix expression for the package.

put the tests near the package

from #36842 (comment):

Closing this because having two testing infrastructures that do almost exactly the same thing but without sharing any code is really not something we should do.

make sure the package tests are differentiated enough that they're not a VM test in ad-hoc bash clothes

7c6f434c · 2018-07-14T20:42:56Z

@grahamc as I mentioned, this set of change requirements is a permanent veto.

Rewriting in Python might be a good idea, even if Bash is still better than Perl (we do use Bash for wrappers even if they have Perl somewhere in the closure).

But:

Putting tests near the packages is not always a thing that can be defined.
GUI apps require launching an X session; so that it always duplicated functionality.
A full-VM NixOS test takes around a minute to start even if it just runs hello

If we want to avoid duplicating functionality, we need a NixOS profile (an empty one) that can be launched in Nix sandbox on non-NixOS Linux with full-sandboxing Nix in 3 seconds or less. Then we will add Xdummy etc. and get NixOS test that are usable for testing applications in minimal environment, not just to test the complicated and expensive NixOS features.

timokau · 2018-07-15T20:23:13Z

How about we split this up. Parts of this may be uncontroversial and can be merged. Other parts could then be discussed in focus.

We could start with just the command line stuff. Maybe the binary checks in particular. I imagine those might be uncontroversial since

they are recommended in the current issue template and should be automated. They already are executed in an automated fashion by @r-ryantm and this is an opportunity to make that universal
they don't duplicate any functionality. It would definitely not make any sense to spin up a VM for that. You could implement that functionality as part of the regular installChecks, but that would require an unnecessary rebuild of all packages every time a test is changed.
no X functionality needed
they provide clear value right now
they make sense to implement in bash and little code is required in general

So we could start with that. After we're done with that, we can talk about other parts of this PR, in separation.

@grahamc @edolstra do you think that is a good idea?
@7c6f434c do you want to do that? If not, do you want me to do that?

7c6f434c · 2018-07-15T20:46:12Z

- they are recommended in the current issue template and should be automated. They already are executed in an automated fashion by @r-ryantm and this is an opportunity to make that universal

Well, they do require some judgement as recommended and as done by the update bot. I have _some_ as examples in the PR; I probably want to see a list of the checks you propose to make available. Also, where it should live? I dunno, build-support?

- they provide clear value right now

I think for the, to provide value we need some clear scheme with some flexibility Possible options: * exits with 0, exits with 0–k code, in particular: exits with 0–126 code (non-signal exit) * started with no arguments, started with --help, started with --version * prints something, prints at least/at most N lines, prints something that includes the version of the package * all, at least one, all but a known list, just in a known list Which do we want? Would it be too heavy for bash if we implement all? For Python?

timokau · 2018-07-15T21:05:09Z

>- they are recommended in the current issue template and should be automated. They already are executed in an automated fashion by @r-ryantm and this is an opportunity to make that universal Well, they do require some judgement as recommended and as done by the update bot.

Yes we will still need to customize that. Make it possible to overwrite which binary should respond to which flag with which exit code / output. But we should be able to provide sane defaults (exit code 0 to --help? @ryantm could probably help with the details). Then we would have to decide weather to make that opt-in or opt-out.

I have _some_ as examples in the PR; I probably want to see a list of the checks you propose to make available.

I'd say pretty much exactly what your current `check-executables.nix` does.

Also, where it should live? I dunno, build-support?

I think `tests` makes sense. But `build-support` is fine too. Thats just details.

>- they provide clear value right now I think for the, to provide value we need some clear scheme with some flexibility Possible options: * exits with 0, exits with 0–k code, in particular: exits with 0–126 code (non-signal exit) * started with no arguments, started with --help, started with --version * prints something, prints at least/at most N lines, prints something that includes the version of the package * all, at least one, all but a known list, just in a known list

Yes, those are defaults we should decide on in that PR. Probably best to start really simple: "`--help` exists 0?" and go from there.

Which do we want? Would it be too heavy for bash if we implement all? For Python?

We could just as well implement that with python. Although for passing arguments to command line tools and checking their exit code, python is probably more clumsy than bash. It'd work though.

7c6f434c · 2018-07-15T21:16:50Z

> Also, where it should live? I dunno, build-support? I think `tests` makes sense. But `build-support` is fine too. Thats just details.

Well, /tests might have bad luck, especially as it doesn't test anything.

We could just as well implement that with python. Although for passing arguments to command line tools and checking their exit code, python is probably more clumsy than bash. It'd work though.

I think we need to implement _some_ customisation from the beginning (so my question is which options and how we want to customise). Implementation options are bash, python and generating trivial bash by nontrivial Nix. Pythong might be better for handling per-executable policies, but it is an extra dependency and yes it might be a bit more verbose for simple option passing.

timokau · 2018-07-16T12:20:24Z

On 18-07-15 14:16, Michael Raskin wrote: >> Also, where it should live? I dunno, build-support? >I think `tests` makes sense. But `build-support` is fine too. Thats just details. Well, /tests might have bad luck, especially as it doesn't test anything.

I don't understand that point. Of course it tests something? I would imagine the "infrastructure" and defaults to live in `tests` and the per-package customization in the per-package directory (following "tests close to source").

>We could just as well implement that with python. Although for passing arguments to command line tools and checking their exit code, python is probably more clumsy than bash. It'd work though. I think we need to implement _some_ customisation from the beginning (so my question is which options and how we want to customise).

Again, I think @ryantm probably has some insights here. We might start by manually adding tests to a few packages as an experiment (opt-in). Then maybe a trust-on-first-use-esque principle could be implemented for @r-ryantm. It could execute its binary tests like it does now and then add tests for the ones that pass.

Implementation options are bash, python and generating trivial bash by nontrivial Nix. Pythong might be better for handling per-executable policies, but it is an extra dependency and yes it might be a bit more verbose for simple option passing.

`check-executables.nix` seems to use the second option and is perfectly fine in my opinion. We could of course switch implementation language if that seems to make sense at a later point -- its just a couple of lines.

ryantm · 2018-07-16T13:27:36Z

@timokau There is a pretty big difference between a procedure that provides advisory information about a build and one that stops a build from completing, so I'm not sure the nixpkgs-update binary checking will be that interesting for this.

nixpkgs-update tries to execute every file in the bin directory with a bunch of different inputs, and it records how many files can get a non-zero exit code (with those inputs) and how many can output the new version string (with those inputs). I've learned I need to run the binaries with a strict timeout, but still there are a few I've had to blacklist because they get stuck during checking.

timokau · 2018-07-16T13:42:47Z

On 18-07-16 06:27, Ryan Mulligan wrote: @timokau There is a pretty big difference between a procedure that provides advisory information about a build and one that stops a build from completing, so I'm not sure the nixpkgs-update binary checking will be that interesting for this. nixpkgs-update tries to execute every file in the `bin` directory with a [bunch of different inputs](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L54), and it records how many files can get a non-zero exit code (with those inputs) and how many can output the new version string (with those inputs). I've learned I need to run the binaries with a [strict timeout](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L43), but still there are a few I've had to blacklist because they [get stuck during checking](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Blacklist.hs#L97).

Thanks for your input! I think the only difference is that currently the bot just runs all tests on all binaries. The checks talked about here would have to be a bit more selective.

timokau · 2018-07-20T15:00:40Z

Since this is stalling again, I'll probably start an initial implementation soon if nobody objects.

nixos-discourse · 2019-10-10T10:20:14Z