[RFC] nixpkgs/tests: create nixos-independent PoC tests #36842
Conversation
cc @Profpatsch
Would it be possible to add a convenience function that simply tests whether all binaries in `$out/bin` (like the checkpoint in the PR template) can be executed? For example, this function could check that each executable terminates without a segfault or dynamic linker errors.
Sure. Added a bit of stuff: a combined test of …
Updated the description because of a couple of additional tests.
@7c6f434c have you seen my talk from NixCon?
The …
@markuskowa sure, that's easy. A marginally more interesting question is how to make sure the program doesn't have a behaviour where it waits for an event without needing either input or output; a test like the current ones will just hang. On the other hand, there are also commands that will exit with 127 when passed …
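To make the hanging-program concern concrete, here is a minimal sketch (not code from this PR; the binary path and the 5-second limit are arbitrary choices for illustration) of a smoke test that feeds the program an empty stdin and bounds its runtime, so an event-loop program cannot stall the test:

```shell
# Illustrative only: smoke-test a single binary without letting it hang.
bin=/bin/true   # stand-in for a binary from $out/bin

# Empty stdin plus a timeout keeps programs that wait for events
# from blocking the test forever.
timeout 5 "$bin" </dev/null >/dev/null 2>&1
status=$?

# timeout(1) reports 124 when it had to kill the process;
# 128+N means the process died from signal N (e.g. 139 = SIGSEGV).
if [ "$status" -eq 124 ]; then
  echo "hang: timed out"
elif [ "$status" -ge 128 ]; then
  echo "crash: exit $status"
else
  echo "ok: exit $status"
fi
```

Note that a command exiting 127 on an unknown flag would still count as "ok" here; distinguishing such cases is exactly the policy question raised in this thread.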
@Profpatsch I don't normally watch video, but yes, I have read your slides before. In a way, this PR is orthogonal to yours: you want a way to run tests, I want a way to create them. You were willing to agree with VM tests; I want to avoid their overhead, in the hope that faster tests are easier to debug and run, so there will be more of them. Imagine an auto-upgrade script that will be able to claim: «In 2017, half of the upgrades got strictly less testing (manual testing) than the amount we run automatically». I am not sure I completely believe in the full feasibility of your maxim re: locality and unavoidability of tests, because I want to have tests like TeXLive + Zathura that cannot be ascribed to any single package; an interoperability test is hard to assign to any one side (and in some other cases it can also increase the build closure beyond reason, even if Zathura happens to depend on TeXLive in this specific case).
Added a note about the current need for large fonts to make OCR work. |
@7c6f434c Sure, there needs to be some room for exceptions. Using …
@markuskowa if you look inside the implementation, you'll see there is: I had to ban …
Overall, I’d recommend a very conservative approach to this completely new API; rather than exposing everything from the beginning, first keep it local and only expose something once there are two or more usages of the same pattern.
Also: documentation, docstrings for everything, and manual entries before it can be merged.
tests/lib/default.nix
Outdated

```nix
{ pkgs }:
rec {
```
Every part of this attribute set will become the public testing API as soon as it gets merged, and will have to be supported for probably all eternity.
Big -1 from me on the currently exposed symbols; there needs to be a lot more consensus first, and new (non-internal) symbols should only be introduced with care and proper documentation.
As I said, the abstraction is currently done wrong, because there are not enough tests to generalise from. The comment-gathering re: what kinds of tests can be done seems to be happening anyway.
Consensus won't happen anyway, but right now even consistency would be a step ahead.
What I did miss is that I hadn't added [RFC] and [WIP] tags to the title to clarify the intent; fixed that.
tests/data/latex-file.tex
Outdated

```latex
Test LaTeX content

\end{document}
```
Sorting the tests into folders might be more sensible, since test data normally cannot be shared:

```
tests/
  zathura-latex/
    latex-file.tex
    default.nix
```

Once there are hundreds of tests, this will help with naming files.
Yes, sounds good.
Actually, maybe having this test as a combined test was a mistake anyway.
Separated tests for generation and viewing of PDFs, created some folders
@7c6f434c Thanks, it looks pretty neat now. I will give …
Re: two instances of the same pattern: unfortunately, most of the X stuff gets repeated even with just two tests… I do agree that the current API doesn't exist, and neither is there any documentation of this non-existent object. This has to be fixed before merging. Still, I don't think Nixpkgs manual entries should be added at this step, because of their high visibility; some period of use by early adopters should happen before something is advertised in the manual.
Updated description to reflect a slightly wider variety. Suggestions welcome, implementation of suggestions not guaranteed.
Change layout a bit: separate tests for PDF generation from tests for PDF viewing; the latter use the outputs of the former.
@wmertens Except that the tests proposed here are system/integration tests, exactly like the existing VM infrastructure: running X servers, doing OCR on the output, running system daemons like dbus, etc. We're not talking about running …
Application integration tests provide only what the application needs to run, not what NixOS needs to boot. The difference is severalfold, with a corresponding difference in speed and debugging annoyance.
@edolstra Thoughts on where/how to address this? Or do you think the existing NixOS tests are adequate?
To pitch in: a short testing feedback loop is absolutely critical if you want to encourage a test-driven packaging process. Test durations should ideally be measured in seconds, and be short enough that they don't (mentally) interrupt whatever fixing work you're trying to do. It should be a matter of 'open terminal, run command, look at output, go back to fixing'. The current bootstrapping time for any system test is simply too high to usefully test application functionality. If we don't want duplication, then perhaps we should look into ways to deduplicate the testing functionality between system tests and these proposed application integration tests; but that application integration tests are necessary as a third category is not really in question. (EDIT: To be clear, I'm not saying that the system tests are not useful. But "test that everything works in the context of a NixOS system" and "test that particular functionality works when combined with particular other applications" are two distinctly different use cases, with different optimal tradeoffs.)
To throw in a completely different angle: I found https://github.com/NixOS/nixpkgs/blob/c4fc70f53cfaf673bde7fd009826d62bececa161/pkgs/development/ruby-modules/bundled-common/test.nix incredibly useful. Maybe that's because what I was writing was (too?) complex, but testing the actual Nix functions themselves was extremely quick, a very fast feedback cycle. There are some (underappreciated, imo) testing utilities in …
@edolstra Is there something we (I) can do to make this experiment acceptable? It has many merits and apparently at least some community support. It also doesn't add very much complexity, and can be seen as a community experiment for now. We could then schedule a revisit of the matter after some time. Or is this BDFL-blocked in some form?
@timokau In these replies he hasn't said package tests are bad or undesirable, and he has provided feedback on ways to approach it which would be acceptable. From #36842 (comment):

> Don't duplicate the functionality, especially in a pretty crappy language
> put the tests near the package

From #36842 (comment):

> make sure the package tests are differentiated enough that they're not a VM test in ad-hoc bash clothes
@grahamc as I mentioned, this set of change requirements is a permanent veto. Rewriting in Python might be a good idea, even if Bash is still better than Perl (we do use Bash for wrappers even when they have Perl somewhere in the closure). But:

If we want to avoid duplicating functionality, we need a NixOS profile (an empty one) that can be launched in the Nix sandbox on non-NixOS Linux, with fully sandboxing Nix, in 3 seconds or less. Then we can add Xdummy etc. and get NixOS tests that are usable for testing applications in a minimal environment, not just for testing the complicated and expensive NixOS features.
How about we split this up? Parts of this may be uncontroversial and can be merged; other parts could then be discussed in focus. We could start with just the command line stuff, maybe the binary checks in particular. I imagine those might be uncontroversial since …

So we could start with that. After we're done with that, we can talk about the other parts of this PR, in separation. @grahamc @edolstra do you think that is a good idea?
> - they are recommended in the current issue template and should be automated. They are already executed in an automated fashion by @r-ryantm, and this is an opportunity to make that universal

Well, they do require some judgement, as recommended and as done by the update bot.

I have _some_ as examples in the PR; I probably want to see a list of the checks you propose to make available.

Also, where should it live? I dunno, build-support?

> - they provide clear value right now

I think for them to provide value we need some clear scheme with some flexibility.

Possible options:

* exits with 0; exits with a code in 0–k; in particular, exits with a code in 0–126 (a non-signal exit)
* started with no arguments, started with `--help`, started with `--version`
* prints something; prints at least/at most N lines; prints something that includes the version of the package
* all binaries, at least one, all but a known list, just those in a known list

Which do we want? Would it be too heavy for bash if we implement all of them? For Python?
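As a concrete (hypothetical) reading of one cell of this matrix ("all binaries, started with `--help`, exit with a 0–126 code"), here is a sketch in plain sh; the demo directory and its single `hello` script are invented for the example and are not part of any existing nixpkgs interface:

```shell
# Hypothetical sketch of one policy: every binary in a bin/ directory,
# started with --help, must exit with a non-signal code (0-126).
# The demo directory and its script are made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/bin"
printf '#!/bin/sh\necho usage\nexit 0\n' > "$demo/bin/hello"
chmod +x "$demo/bin/hello"

fail=0
for bin in "$demo"/bin/*; do
  [ -x "$bin" ] || continue
  # Empty stdin and a timeout, so event-loop programs cannot hang the test.
  timeout 5 "$bin" --help </dev/null >/dev/null 2>&1
  status=$?
  # 127 usually means "command not found"; 128+N means killed by signal N.
  if [ "$status" -gt 126 ]; then
    echo "FAIL: $bin (exit $status)"
    fail=1
  fi
done
[ "$fail" -eq 0 ] && echo "all binaries passed"
```

Each of the other cells (version-string checks, line counts, known-list exclusions) would slot into the same loop as a different predicate.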
> > they are recommended in the current issue template and should be automated. They are already executed in an automated fashion by @r-ryantm, and this is an opportunity to make that universal
>
> Well, they do require some judgement, as recommended and as done by the update bot.

Yes, we will still need to customize that: make it possible to override which binary should respond to which flag with which exit code / output. But we should be able to provide sane defaults (exit code 0 for `--help`? @ryantm could probably help with the details). Then we would have to decide whether to make that opt-in or opt-out.

> I have _some_ as examples in the PR; I probably want to see a list of the checks you propose to make available.

I'd say pretty much exactly what your current `check-executables.nix` does.

> Also, where should it live? I dunno, build-support?

I think `tests` makes sense. But `build-support` is fine too. That's just a detail.

> > they provide clear value right now
>
> I think for them to provide value we need some clear scheme with some flexibility.
>
> Possible options:
>
> * exits with 0; exits with a code in 0–k; in particular, exits with a code in 0–126 (a non-signal exit)
> * started with no arguments, started with `--help`, started with `--version`
> * prints something; prints at least/at most N lines; prints something that includes the version of the package
> * all binaries, at least one, all but a known list, just those in a known list

Yes, those are defaults we should decide on in that PR. Probably best to start really simple ("`--help` exits 0?") and go from there.

> Which do we want? Would it be too heavy for bash if we implement all of them? For Python?

We could just as well implement that with Python, although for passing arguments to command line tools and checking their exit codes, Python is probably more clumsy than bash. It'd work though.
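The per-executable overrides mentioned above need not force the Python choice; a sketch in plain sh (all names invented, nothing here is an existing nixpkgs interface) where one binary is known to answer `--version` instead of `--help`:

```shell
# Invented example: a default flag with per-executable overrides,
# kept POSIX-sh simple via a case statement instead of associative arrays.
flag_for() {
  case "$1" in
    hello) echo "--version" ;;  # hello is known not to support --help
    *)     echo "--help"    ;;
  esac
}

demo=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$demo/hello"
printf '#!/bin/sh\nexit 0\n' > "$demo/world"
chmod +x "$demo/hello" "$demo/world"

results=""
for bin in "$demo"/*; do
  name=$(basename "$bin")
  flag=$(flag_for "$name")
  # Empty stdin and a timeout, as in the other sketches.
  if timeout 5 "$bin" "$flag" </dev/null >/dev/null 2>&1; then
    results="$results$name $flag: ok
"
  else
    results="$results$name $flag: exit $?
"
  fi
done
printf '%s' "$results"
```

The override table is the part that would most naturally be generated per-package from Nix, whatever the runner language ends up being.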
> > Also, where should it live? I dunno, build-support?
>
> I think `tests` makes sense. But `build-support` is fine too. That's just a detail.

Well, /tests might have bad luck, especially as it doesn't test anything.

> We could just as well implement that with Python, although for passing arguments to command line tools and checking their exit codes, Python is probably more clumsy than bash. It'd work though.

I think we need to implement _some_ customisation from the beginning (so my question is which options, and how we want to customise them).

Implementation options are bash, Python, and generating trivial bash from nontrivial Nix. Python might be better for handling per-executable policies, but it is an extra dependency, and yes, it might be a bit more verbose for simple option passing.
On 18-07-15 14:16, Michael Raskin wrote:

> > > Also, where should it live? I dunno, build-support?
> >
> > I think `tests` makes sense. But `build-support` is fine too. That's just a detail.
>
> Well, /tests might have bad luck, especially as it doesn't test anything.

I don't understand that point. Of course it tests something?

I would imagine the "infrastructure" and defaults living in `tests` and the per-package customization in the per-package directory (following "tests close to source").

> > We could just as well implement that with Python, although for passing arguments to command line tools and checking their exit codes, Python is probably more clumsy than bash. It'd work though.
>
> I think we need to implement _some_ customisation from the beginning (so my question is which options, and how we want to customise them).

Again, I think @ryantm probably has some insights here. We might start by manually adding tests to a few packages as an experiment (opt-in). Then maybe a trust-on-first-use-esque principle could be implemented for @r-ryantm: it could execute its binary tests like it does now and then add tests for the ones that pass.

> Implementation options are bash, Python, and generating trivial bash from nontrivial Nix. Python might be better for handling per-executable policies, but it is an extra dependency, and yes, it might be a bit more verbose for simple option passing.

`check-executables.nix` seems to use the second option and is perfectly fine in my opinion. We could of course switch the implementation language if that seems to make sense at a later point; it's just a couple of lines.
@timokau There is a pretty big difference between a procedure that provides advisory information about a build and one that stops a build from completing, so I'm not sure the nixpkgs-update binary checking will be that interesting for this. nixpkgs-update tries to execute every file in the `bin` directory with a [bunch of different inputs](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L54), and it records how many files can get a non-zero exit code (with those inputs) and how many can output the new version string (with those inputs). I've learned I need to run the binaries with a [strict timeout](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L43), but still there are a few I've had to blacklist because they [get stuck during checking](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Blacklist.hs#L97).
On 18-07-16 06:27, Ryan Mulligan wrote:

> @timokau There is a pretty big difference between a procedure that provides advisory information about a build and one that stops a build from completing, so I'm not sure the nixpkgs-update binary checking will be that interesting for this.
>
> nixpkgs-update tries to execute every file in the `bin` directory with a [bunch of different inputs](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L54), and it records how many files can get a non-zero exit code (with those inputs) and how many can output the new version string (with those inputs). I've learned I need to run the binaries with a [strict timeout](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Check.hs#L43), but still there are a few I've had to blacklist because they [get stuck during checking](https://github.com/ryantm/nixpkgs-update/blob/5cf1bc67aa1165ab44df1091df84107688d4253e/src/Blacklist.hs#L97).

Thanks for your input! I think the only difference is that currently the bot just runs all tests on all binaries. The checks discussed here would have to be a bit more selective.
Since this is stalling again, I'll probably start an initial implementation soon if nobody objects.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/adding-installcheck-to-packages/4317/2
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/how-many-people-are-paid-to-work-on-nix-nixpkgs/8307/40
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/wild-idea-how-about-splitting-nixpkgs-and-nixos/11487/3
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/contributor-retention/12412/21
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/contributor-retention/12412/27
for libreoffice, maxima, texlive, imagemagick, zathura, evince, youtube-dl
Motivation for this change
Some things in Nixpkgs are testable via NixOS tests, which is good. But NixOS tests are an integration test framework for an OS: they need a system path with a lot of things and a long list of system services, require /dev/kvm access, boot a full NixOS instance (which is not instant, especially in KVM), have significant overhead even afterwards, and bring a lot of sources of non-determinism and hard-to-track state transitions. Also, they currently only support keyboard input.
What I present is a sample of how simple minimally useful «nothing obvious is broken» tests can be.

Highlights:

* … `xdummy` …; if I don't, even better.
* … `nix-shell`.
* `fmbt` has an acceptable interface for OCR-driven mouse-based GUI testing.

Not done; more examples:

* … `default.nix`.
* `maxima.nix` is not a test for Maxima; it is a test for SBCL. How should this be reflected? This is a large and complicated topic, and NixOS tests could be the first goal anyway.

Things done
* Tested using sandboxing (`build-use-sandbox` in `nix.conf` on non-NixOS)
* Tested via one or more NixOS test(s) if existing and applicable for the change: emphatically not applicable