Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] nixpkgs/tests: create nixos-independent PoC tests #36842

Closed
wants to merge 11 commits into from

Conversation

7c6f434c
Copy link
Member

@7c6f434c 7c6f434c commented Mar 12, 2018

for libreoffice, maxima, texlive, imagemagick, zathura, evince, youtube-dl

Motivation for this change

Some things in Nixpkgs are testable via NixOS test, which is good.

But NixOS tests are an integration test framework for an OS. So they need a system path with a lot of things, a long list of system services, require /dev/kvm access, boot a full NixOS instance which is not instant — especially in KVM, have significant overhead even afterwards and a lot of sources of non-determinism and hard-to-track state transitions.

Also, they currently only support keyboard input.

What I present is a sample of how simple minimally useful tests «nothing obvious is broken» can be.

Highlights:

  • Cheap; if I need X I just run xdummy, if I don't — even better.
  • Self-contained: if you see something working in a sandbox test without VMs, you can always reproduce it inside a proper combination of namespace jails, and probably you can just run it in a nix-shell.
  • Mouse-compatible: the recently added fmbt has an acceptable interface for OCR-driven mouse-based GUI testing.
  • Transparent: I run few enough processes that I don't need to put random sleeps to let things settle. I know what I am waiting for and I can poll its state directly.
  • Can test application-level integration: see a test that checks that Zathura can open pdfLaTeX output.
  • Meta.
  • There are some brief comments about the intent and planned evolution of the code.

Not done — more examples:

  • Any abstraction. Current version of abstraction is still somewhat weird, should be replaced after we have more tests.
  • Any support for a test suite as a coherent whole. Well, you can build the entire default.nix.
  • Connecting packages (and especially libraries) and tests. Realistically, maxima.nix is not a test for Maxima — it is a test for SBCL. How this should be reflected? This is a large an complicated topic, and NixOS tests could be the first goal anyway.
  • Non-default pre-processing for OCR: right now I use a huge DPI or zoom on a small screen geometry precisely to make sure that a character is not just an edge, as edge detection mangles single-pixel-wide characters too much to recognise.
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • almost NixOS
  • Tested via one or more NixOS test(s) if existing and applicable for the change Emphatically not applicable

@grahamc
Copy link
Member

grahamc commented Mar 12, 2018

cc @Profpatsch

@markuskowa
Copy link
Member

Would be possible to add a convenience function that tests simply if all binaries (like the checkpoint in the PR template) in $out/bin can be executed? For example let's say this function would check if the executable terminates without a segfault or dynamic linker errors.

@7c6f434c
Copy link
Member Author

Sure. Added a bit of stuff: a combined test of pdflatex + zathura, and a test that everything in coreutils except false and test reacts to --version with a non-empty string.

@7c6f434c
Copy link
Member Author

Updated the description because of an additional couple of tests.

@Profpatsch
Copy link
Member

Profpatsch commented Mar 13, 2018

@7c6f434c have you seen my talk from Nixcon?
I should be able to finally push the code, not that a lot has changed by I didn’t have the time up until now.

@markuskowa
Copy link
Member

The check-all-executables functions might even have a check mode that is even more general and check for special return codes that indicate a fatal error (126-165, http://www.tldp.org/LDP/abs/html/exitcodes.html). The motivation behind that is the fact, that once in while I have come across packages that build but crash with a segfault or a linker error.

@7c6f434c
Copy link
Member Author

@markuskowa sure, that's easy.

A marginally more interesting question is how to make sure the program doesn't have a behaviour where it waits for an event to react without needing either input or output, a test like the current ones will just hang.

On the other hand, there are also commands that will exit with 127 when passed --help, so my current PR is a PoC for easily creating cheap tests according to human judgement, not for writing a test that Must Always Pass For Every Package.

@7c6f434c
Copy link
Member Author

@Profpatsch I don't normally watch video, but yes, I have read your slides before.

In a way, this PR is orthogonal in a way: you want a way to run tests, I want a way to create them. You were willing to agree with VM tests, I want to avoid their overhead in a hope that faster tests are easier to debug and run, so there will be more of them. Imagine an auto-upgrade script that will be able claim «In 2017, half of the upgrades got strictly less testing — manual testing — than the amount we run automatically».

I am not sure I completely believe in full feasibility of your maxim re: locality and unavoidability of tests, because I want to have tests like TeXLive + Zathura that cannot be ascribed to any single package, and an interoperability test is hard to assign to any one side (and in some other cases can also increase build closure beyond reason, even if Zathura happens to depend on TeXLive in this specific case).

@7c6f434c
Copy link
Member Author

Added a note about the current need for large fonts to make OCR work.

@markuskowa
Copy link
Member

@7c6f434c Sure there needs to be some room for exceptions. Using --version as a standard argument is a good default but not all executables will understand it. I agree that this simple execution test needs to be decided on case-by-case basis. I will try to come up with a refinement.

@7c6f434c
Copy link
Member Author

@markuskowa if you look inside the implementation, you'll see there is — I had to ban false and test from version check, after all. I have now added some support for non-special-ness of the exit codes.

Copy link
Member

@Profpatsch Profpatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I’d recommend a very conservative approach to this completely new API; rather than expose everything from the beginning, first keep it local and only expose once there’s two or more usages of the same pattern.

Also, documentation, docstrings for everything and manual entries before it can be merged.

@@ -0,0 +1,62 @@
{ pkgs }:
rec {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every part of this attribute will be the public testing API as soon as it gets merged and will have to be supported for probably all eternity.
Big -1 from me for the current exposed symbols, there needs to be a lot more consensus first, and new (non-internal) symbols should only be introduced with care and properly documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, abstraction is currently done wrong, because there are not enough tests to generalise from. The comment-gathering re: what kinds of tests can be done seems to happen somehow.

Consensus won't happen anyway, but right now even consistency will be a step ahead.

What I did miss is that I haven't added [RFC] and [WIP] tags in the name to clarify the intent; fixed that.

Test LaTeX content

\end{document}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting the tests into folders might be more sensible, since test data can normally not be shared;

tests/
  zathura-latex/
    latex-file.tex
    default.nix

Once there are hundreds of tests, this will help with naming files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe having this test as a combined test was a mistake anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated tests for generation and viewing of PDFs, created some folders

@markuskowa
Copy link
Member

@7c6f434c Thanks, it looks pretty neat now. I will give checkAllExecutables a try with some of my derivations.

@7c6f434c 7c6f434c changed the title nixpkgs/tests: create nixos-independent PoC tests [RFC] [WIP] nixpkgs/tests: create nixos-independent PoC tests Mar 13, 2018
@7c6f434c
Copy link
Member Author

Re: two instances of the same pattern: unfortunately, most of the X stuff gets repeated even with two tests…

I do agree that the current API doesn't exist, neither is there any documentation of this non-existent object. This has to be fixed before merging. Still, I don't think Nixpgs manual entries should be added at this step, because of their high visibility — some period of use by early adopters should happen before advertising something in the manual.

@7c6f434c
Copy link
Member Author

Updated description to reflect a slightly wider variety. Suggestions welcome, implementation of suggestions not guaranteed.

@edolstra
Copy link
Member

@wmertens Except that the tests proposed here are system/integration tests, exactly like the existing VM infrastructure, including running X servers, doing OCR on output, running system daemons like dbus, etc. We're not talking about running make check which is already covered by checkPhase.

@7c6f434c
Copy link
Member Author

7c6f434c commented Mar 21, 2018 via email

@shlevy
Copy link
Member

shlevy commented Mar 29, 2018

@edolstra Thoughts on where/how to address this? Or do you think the existing NixOS tests are adequate?

@joepie91
Copy link
Contributor

joepie91 commented Jun 9, 2018

To pitch in: a short testing feedback loop is absolutely critical if you want to encourage a test-driven packaging process. Test durations should ideally be measured in seconds, and be short enough that they don't (mentally) interrupt whatever fixing work you're trying to do. It should be a matter of 'open terminal, run command, look at output, go back to fixing'.

The current bootstrapping time for any system test is simply too high to usefully test application functionality.

If we don't want deduplication, then perhaps we should look into possible ways to deduplicate the testing functionality between system tests and these proposed application integration tests; but that application integration tests are necessary as a third category is not really in question.

(EDIT: To be clear, I'm not saying that the system tests are not useful. But "test that everything works in the context of a NixOS system" and "test that particular functionality works when combined with particular other applications" are two distinctly different usecases, with different optimal tradeoffs.)

@nyarly
Copy link
Contributor

nyarly commented Jun 27, 2018

To throw in a completely different angle, I found https://github.com/NixOS/nixpkgs/blob/c4fc70f53cfaf673bde7fd009826d62bececa161/pkgs/development/ruby-modules/bundled-common/test.nix incredibly useful. Maybe that's because what I was writing was (too?) complex, but testing the actual Nix functions themselves was extremely quick, a very fast feedback cycle.

There are some (underappreciated, imo) testing utilities in lib/, but having something more robust and more widely accepted would be nifty.

@timokau
Copy link
Member

timokau commented Jul 14, 2018

@edolstra Is there something we (I) can do to make this experiment acceptable? It has many merits and apparently at least some community support. It also doesn't add very much complexity and can be seen as a community experiment for now. We could then schedule a re-visit of the matter after some time.

Or is this BDFL-blocked in any form?

@timokau timokau mentioned this pull request Jul 14, 2018
@grahamc
Copy link
Member

grahamc commented Jul 14, 2018

@timokau In these replies he hasn't said package tests are bad or undesirable, and provided feedback on ways to approach it which would be acceptable.

from #36842 (comment):

Duplicating the VM testing infrastructure in a crappy language (bash) that's significantly worse to maintain or write correct code in than Perl doesn't seem like a good idea to me.

Don't duplicate the functionality, especially in a pretty crappy language

Package tests should be kept close to the Nix expression for the package.

put the tests near the package

from #36842 (comment):

Closing this because having two testing infrastructures that do almost exactly the same thing but without sharing any code is really not something we should do.

make sure the package tests are differentiated enough that they're not a VM test in ad-hoc bash clothes

@7c6f434c
Copy link
Member Author

@grahamc as I mentioned, this set of change requirements is a permanent veto.

Rewriting in Python might be a good idea, even if Bash is still better than Perl (we do use Bash for wrappers even if they have Perl somewhere in the closure).

But:

  • Putting tests near the packages is not always a thing that can be defined.
  • GUI apps require launching an X session; so that it always duplicated functionality.
  • A full-VM NixOS test takes around a minute to start even if it just runs hello

If we want to avoid duplicating functionality, we need a NixOS profile (an empty one) that can be launched in Nix sandbox on non-NixOS Linux with full-sandboxing Nix in 3 seconds or less. Then we will add Xdummy etc. and get NixOS test that are usable for testing applications in minimal environment, not just to test the complicated and expensive NixOS features.

@timokau timokau mentioned this pull request Jul 15, 2018
9 tasks
@timokau
Copy link
Member

timokau commented Jul 15, 2018

How about we split this up. Parts of this may be uncontroversial and can be merged. Other parts could then be discussed in focus.

We could start with just the command line stuff. Maybe the binary checks in particular. I imagine those might be uncontroversial since

  • they are recommended in the current issue template and should be automated. They already are executed in an automated fashion by @r-ryantm and this is an opportunity to make that universal
  • they don't duplicate any functionality. It would definitely not make any sense to spin up a VM for that. You could implement that functionality as part of the regular installChecks, but that would require an unnecessary rebuild of all packages every time a test is changed.
  • no X functionality needed
  • they provide clear value right now
  • they make sense to implement in bash and little code is required in general

So we could start with that. After we're done with that, we can talk about other parts of this PR, in separation.

@grahamc @edolstra do you think that is a good idea?
@7c6f434c do you want to do that? If not, do you want me to do that?

@7c6f434c
Copy link
Member Author

7c6f434c commented Jul 15, 2018 via email

@timokau
Copy link
Member

timokau commented Jul 15, 2018 via email

@7c6f434c
Copy link
Member Author

7c6f434c commented Jul 15, 2018 via email

@timokau
Copy link
Member

timokau commented Jul 16, 2018 via email

@ryantm
Copy link
Member

ryantm commented Jul 16, 2018

@timokau There is a pretty big difference between a procedure that provides advisory information about a build and one that stops a build from completing, so I'm not sure the nixpkgs-update binary checking will be that interesting for this.

nixpkgs-update tries to execute every file in the bin directory with a bunch of different inputs, and it records how many files can get a non-zero exit code (with those inputs) and how many can output the new version string (with those inputs). I've learned I need to run the binaries with a strict timeout, but still there are a few I've had to blacklist because they get stuck during checking.

@timokau
Copy link
Member

timokau commented Jul 16, 2018 via email

@timokau
Copy link
Member

timokau commented Jul 20, 2018

Since this is stalling again, I'll probably start an initial implementation soon if nobody objects.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/adding-installcheck-to-packages/4317/2

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-many-people-are-paid-to-work-on-nix-nixpkgs/8307/40

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/wild-idea-how-about-splitting-nixpkgs-and-nixos/11487/3

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/contributor-retention/12412/21

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/contributor-retention/12412/27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet